VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification
Authors
Abstract
Multimodal learning from document data has achieved great success lately, as it allows semantically meaningful features to be pre-trained as a prior for a learnable downstream task. In this paper, we approach the document classification problem by learning cross-modal representations through language and vision cues, considering intra- and inter-modality relationships. Instead of merging features from different modalities into a joint representation space, the proposed method exploits high-level interactions and learns relevant semantic information from effective attention flows within and across modalities. The learning objective is devised between intra- and inter-modality alignment tasks, where the similarity distribution per task is computed by contracting positive sample pairs while simultaneously contrasting negative ones in the joint representation space. Extensive experiments on public benchmark datasets demonstrate the effectiveness and generality of our model on both low-scale and large-scale datasets.
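The contrastive objective described in the abstract (pulling positive pairs together while pushing negative pairs apart) can be illustrated with a generic InfoNCE-style loss. The sketch below is not the authors' implementation: the function names `info_nce` and `cross_modal_loss`, the temperature value, and the use of in-batch negatives are all assumptions made for illustration, and only the inter-modality (vision to language) case is shown.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (illustrative, not the paper's exact objective).

    Row i of `anchors` is treated as the positive match of row i of
    `positives`; every other row in the batch acts as a negative.
    """
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        # Temperature-scaled cosine similarity to every candidate in the batch.
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature for cand in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Negative log-probability assigned to the true pair.
        total += log_denom - logits[i]
    return total / len(a)


def cross_modal_loss(vision, language, temperature=0.1):
    """Symmetric loss: average of vision-to-language and language-to-vision terms."""
    return 0.5 * (
        info_nce(vision, language, temperature)
        + info_nce(language, vision, temperature)
    )
```

Under these assumptions, a batch whose vision and language embeddings are correctly paired yields a small loss, while a batch with shuffled pairings yields a larger one. An intra-modality term could be sketched the same way by passing two views of the same modality to `info_nce`.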
Similar resources
An Improved Hierarchical Bayesian Model of Language for Document Classification
This paper addresses the fundamental problem of document classification, and we focus attention on classification problems where the classes are mutually exclusive. In the course of the paper we advocate an approximate sampling distribution for word counts in documents, and demonstrate the model’s capacity to outperform both the simple multinomial and more recently proposed extensions on the cl...
A link-bridged topic model for cross-domain document classification
0306-4573/$ see front matter. 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.ipm.2013.05.002. Corresponding author at: Department of Computer Science, South China University of Technology, Guangzhou, China. Authors: P. Yang, W. Gao, Q. Tan, K.-F. Wong...
Contrastive utterances make alternatives salient - cross-modal priming evidence
Sentences with contrastive intonation are assumed to presuppose contextual alternatives to the accented elements. Two cross-modal priming experiments tested in Dutch whether such contextual alternatives are automatically available to listeners. Contrastive associates – but not noncontrastive associates – were facilitated only when primes were produced in sentences with contrastive intonation, ind...
Cross-document relationship classification for text summarization
Multiple documents describing the same event present some interesting challenges for natural language processing. They contain similar information and yet they also exhibit a number of interesting properties: paraphrases, partial agreement, difference in judgment and emphasis, and contradictions. When the sources track an event that evolves over time, more phenomena can be observed: additions, ...
A New Document Embedding Method for News Classification
Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A large amount of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to it. The first step in text classification is to represent documents in a suitable way t...
Journal
Journal title: Pattern Recognition
Year: 2023
ISSN: 1873-5142, 0031-3203
DOI: https://doi.org/10.1016/j.patcog.2023.109419